Anelia Angelova In Partial Fulfillment

Author

  • Anelia Angelova
Abstract

Could a training example be detrimental to learning? Contrary to the common belief that more training data is needed for better generalization, we show that the learning algorithm might be better off when some training examples are discarded. In other words, the quality of the examples matters. We explore a general approach to identify examples that are troublesome for learning with a given model and exclude them from the training set in order to achieve better generalization. We term this process 'data pruning'. The method is intended as a pre-learning step for obtaining better data to train on. The approach consists of creating multiple semi-independent learners from the dataset, each of which is influenced differently by individual examples. The learners' opinions about which examples are difficult are arbitrated by an inference mechanism. Although it comes without guarantees of optimality, data pruning is shown to decrease the generalization error in experiments on real-life data. It is not assumed that the data or the noise can be modeled or that additional training examples are available. Data pruning is applied to obtaining visual category data with little supervision. In this setting, the object data is contaminated with non-object examples. We show that a mechanism for pruning noisy datasets prior to learning can be very successful, especially in the presence of a large amount of contamination or when the algorithm is sensitive to noise. Our experiments demonstrate that data pruning can be worthwhile even if the algorithm has regularization capabilities or mechanisms to cope with noise, and that it has the potential to be a more refined method for regularization or model selection.
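
The abstract does not specify the learners or the inference mechanism, so the following is only a minimal sketch of the general idea, under stated assumptions. It stands in nearest-centroid classifiers for the learners, trains each on a different deterministic subset of the data, and replaces the inference mechanism with a simple vote threshold. The names `fit_centroids`, `predict`, `prune`, `moduli`, and `threshold` are hypothetical and do not come from the thesis.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def fit_centroids(points: Sequence[Point], labels: Sequence[int]) -> Dict[int, List[float]]:
    """Stand-in learner (an assumption): one mean vector per class."""
    grouped: Dict[int, List[Point]] = {}
    for x, y in zip(points, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(c) / len(xs) for c in zip(*xs)] for y, xs in grouped.items()}


def predict(centroids: Dict[int, List[float]], x: Point) -> int:
    """Return the label of the centroid closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def prune(points: Sequence[Point], labels: Sequence[int],
          moduli: Sequence[int] = (3, 4, 5), threshold: float = 0.5) -> List[int]:
    """Return indices of examples to keep.

    Each learner is trained with the examples whose index satisfies
    i % m == r held out, then judges only those held-out examples.
    An example is dropped when more than `threshold` of the learners
    that judged it disagree with its label (a simple vote, used here
    in place of the unspecified inference mechanism).
    """
    n = len(points)
    votes = [0] * n
    wrong = [0] * n
    for m in moduli:
        for r in range(m):
            held_out = [i for i in range(n) if i % m == r]
            train = [i for i in range(n) if i % m != r]
            if not held_out or not train:
                continue
            model = fit_centroids([points[i] for i in train], [labels[i] for i in train])
            for i in held_out:
                votes[i] += 1
                if predict(model, points[i]) != labels[i]:
                    wrong[i] += 1
    return [i for i in range(n) if votes[i] == 0 or wrong[i] / votes[i] <= threshold]


# Toy data: two well-separated classes plus one contaminating example
# (index 12) that lies in the class-0 region but carries label 1.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.2, 0.6), (0.7, 0.3),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.2, 10.6), (10.7, 10.3),
          (0.5, 0.5)]
labels = [0] * 6 + [1] * 6 + [1]

kept = prune(points, labels)
print(kept)
```

On this toy set the contaminating example is judged wrong by every learner that holds it out, so it is the only one dropped. Real data would call for stronger learners and a more careful arbitration step than a fixed vote threshold.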

Similar resources

Improved generator objectives for GANs

We present a framework to understand GAN training as alternating density ratio estimation and approximate divergence minimization. This provides an interpretation for the mismatched GAN generator and discriminator objectives often used in practice, and explains the problem of poor sample diversity. Further, we derive a family of generator objectives that target arbitrary f-divergences without...

Experimental Results from a Terrain Adaptive Navigation System for Planetary Rovers

Results from the experimental testing of a navigation system for planetary rovers called Terrain Adaptive Navigation (TANav) are shown here. This system was designed to enable greater access to and more robust operations in terrains with widely varying slippage. The system achieves this goal by using onboard stereo cameras to remotely classify terrain, predict the slippage of that terrain, and ...

Slip Prediction Using Visual Information

This paper considers prediction of slip from a distance for wheeled ground robots using visual information as input. Large amounts of slippage which can occur on certain surfaces, such as sandy slopes, will negatively affect rover mobility. Therefore, obtaining information about slip before entering a particular terrain can be very useful for better planning and avoiding terrains with large sli...

Thesis Submitted in Partial Fulfillment of the requirement for the Degree of M.A/M. Sc In School consultant

Goal: The aim of this study is to assess and compare the emotional ability of deaf, semi-deaf, and hearing students (aged 14-20) in Mashhad. Method: For this study, 105 students were selected at random, 35 from each group: hearing boys and girls, deaf boys and girls, and semi-deaf boys and girls. This article is useful and explanatory. In this stud...

A Batch-wise ATP Procedure in Hybrid Make-to-Order/Make-to-Stock Manufacturing Environment

Satisfying customer demand necessitates manufacturers understanding the importance of Available-To-Promise (ATP). It directly links available resources to customer orders and has significant impact on overall performance of a supply chain. In this paper, an improvement of the batch-mode ATP function in which the partial fulfillment of the orders is available will be proposed. In other words, in...

Publication date: 2004